Computing Iceberg Queries Efficiently (Paper Number 234)

Authors

  • Min Fang
  • Narayanan Shivakumar
  • Hector Garcia-Molina
  • Rajeev Motwani
Abstract

Many applications compute aggregate functions (such as COUNT, SUM) over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including data warehousing, information retrieval, market basket analysis in data mining, clustering and copy detection. We propose efficient algorithms to evaluate iceberg queries using very little memory and significantly fewer passes over data, as compared to current techniques that use sorting or hashing. We present an experimental case study using over three gigabytes of Web data to illustrate the savings obtained by our algorithms.
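As a minimal sketch of what an iceberg query computes, the Python snippet below counts occurrences of each value and keeps only those that reach a threshold. This is the naive in-memory baseline, not the algorithms proposed in the paper: it holds one counter per distinct value, which is exactly the memory cost the paper aims to avoid. The function name `iceberg_count`, the sample data, and the choice of `>=` for "above the threshold" are assumptions made for illustration.

```python
from collections import Counter


def iceberg_count(rows, threshold):
    """Return {value: count} for values whose COUNT reaches `threshold`.

    Naive baseline: keeps a counter for every distinct value in memory.
    The `>=` comparison is an illustrative assumption.
    """
    counts = Counter(rows)
    return {value: n for value, n in counts.items() if n >= threshold}


# Hypothetical input: a single attribute column with seven rows.
rows = ["a", "b", "a", "c", "a", "b", "d"]
result = iceberg_count(rows, threshold=2)
print(result)
```

Only the frequent values survive the filter (the tip), while the full input still has to be scanned (the iceberg).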


Similar resources

Computing Iceberg Queries Efficiently

Many applications compute aggregate functions over an attribute (or set of attributes) to find aggregate values above some specified threshold. We call such queries iceberg queries, because the number of above-threshold results is often very small (the tip of an iceberg), relative to the large amount of input data (the iceberg). Such iceberg queries are common in many applications, including dat...


Partitioning based algorithms for approximate and exact Iceberg Queries

In many applications it is necessary to identify items which occur frequently within the data set, which may be a materialized or non-materialized relation. Such queries were recently denoted as iceberg queries. Several algorithms for computing iceberg queries were presented, including an approximation algorithm based on concise sampling and an exact algorithm based on sampling combined with multip...


Efficient Computing of Iceberg Queries Using Quantiling

Iceberg queries have been recently identified as important queries for many applications. These queries can be characterized by their huge input-small output. The iceberg refers to the input, and the tip of it refers to the output. We present an efficient algorithm for computing an important class of iceberg queries. This algorithm uses a focusing technique for the query result using quantiling...


Methods for Evaluating Iceberg Queries

Iceberg queries are a special case of SQL queries involving GROUP BY and HAVING clauses, wherein the answer set is small relative to the database size. Iceberg queries have been recently identified as important queries for many applications. Queries can be characterized by their huge input-small output. The iceberg refers to the input, and the tip of it refers to the output. This paper is going...
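The GROUP BY and HAVING form described in this entry can be illustrated with a small, self-contained sketch using Python's built-in `sqlite3` module. The table name `sales`, its columns, the sample rows, and the threshold of 10 are all hypothetical and chosen only for illustration. The query relies on the database engine's own evaluation strategy and does not implement any of the specialized methods these papers discuss.

```python
import sqlite3

# In-memory database with a hypothetical `sales` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("pen", 5), ("pen", 7), ("ink", 1), ("cap", 2), ("cap", 2)],
)

# Iceberg query: aggregate per group, keep only groups at or above the threshold.
rows = conn.execute(
    "SELECT item, SUM(qty) AS total "
    "FROM sales GROUP BY item "
    "HAVING SUM(qty) >= ? "
    "ORDER BY item",
    (10,),
).fetchall()
conn.close()
print(rows)
```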


Publication date: 1998